67 research outputs found

    Why do people (not) like me?: Mining opinion influencing factors from reviews

    Feedback, without doubt, is a very important mechanism for companies or political parties to re-evaluate and improve their processes or policies. In this paper, we propose opinion influencing factors (OIFs) as a means to provide feedback about what influences the opinions of people. We also describe a methodology to mine OIFs from textual documents with the intention to bring a new perspective to the existing recommendation systems by concentrating on service providers (or policy makers) rather than customers. This new perspective enables one to discover the reasons why people like or do not like something by learning relationships among the traits/products via semantic rules and the factors that lead to change on the opinions such as from positive to negative. As a case study we target the healthcare domain, and experiment with the patients’ reviews on doctors. Experimental results show the gist of thousands of comments on particular aspects (also called factors) associated with semantic rules in an effective way

    Secret sharing vs. encryption-based techniques for privacy preserving data mining

    Privacy preserving querying and data publishing has been studied in the context of statistical databases and statistical disclosure control. Recently, large-scale data collection and integration efforts increased privacy concerns which motivated data mining researchers to investigate privacy implications of data mining and how data mining can be performed without violating privacy. In this paper, we first provide an overview of privacy preserving data mining focusing on distributed data sources, then we compare two technologies used in privacy preserving data mining. The first technology is encryption based, and it is used in earlier approaches. The second technology is secret-sharing which is recently being considered as a more efficient approach
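
    As an illustration of the secret-sharing idea mentioned in the abstract above, the following is a minimal sketch of additive secret sharing used to compute a sum over distributed data sources. It is not taken from the paper: the modulus `PRIME` and the function names `make_shares` and `secure_sum` are assumptions chosen for this example, and all parties are simulated in one process.

```python
import secrets

# Assumed modulus for share arithmetic; values must be in [0, PRIME).
PRIME = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n_parties additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum: each party only ever sees random-looking shares."""
    n = len(private_values)
    # Party i splits its private value and sends share j to party j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received, without learning any raw value.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Combining the published partial sums reveals only the total.
    return sum(partial_sums) % PRIME
```

    In this sketch no encryption is involved; each party performs only random number generation and modular addition, which is the kind of cost difference the comparison with encryption-based techniques is concerned with.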

    Discovering private trajectories using background information

    Trajectories are spatio-temporal traces of moving objects which contain valuable information to be harvested by spatio-temporal data mining techniques. Applications like city traffic planning, identification of evacuation routes, trend detection, and many more can benefit from trajectory mining. However, the trajectories of individuals often contain private and sensitive information, so anyone who possesses trajectory data must take special care when disclosing this data. Removing identifiers from trajectories before the release is not effective against linkage type attacks, and rich sources of background information make it even worse. An alternative is to apply transformation techniques to map the given set of trajectories into another set where the distances are preserved. This way, the actual trajectories are not released, but the distance information can still be used for data mining techniques such as clustering. In this paper, we show that an unknown private trajectory can be reconstructed using the available background information together with the mutual distances released for data mining purposes. The background knowledge is in the form of known trajectories and extra information such as the speed limit. We provide analytical results which bound the number of the known trajectories needed to reconstruct private trajectories. Experiments performed on real trajectory data sets show that the number of known samples is surprisingly smaller than the actual theoretical bounds

    Towards trajectory anonymization: a generalization-based approach

    Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still have some privacy leaks. Therefore we propose a randomization based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques

    Privacy risks in trajectory data publishing: reconstructing private trajectories from continuous properties

    Location and time information about individuals can be captured through GPS devices, GSM phones, RFID tag readers, and by other similar means. Such data can be pre-processed to obtain trajectories which are sequences of spatio-temporal data points belonging to a moving object. Recently, advanced data mining techniques have been developed for extracting patterns from moving object trajectories to enable applications such as city traffic planning, identification of evacuation routes, trend detection, and many more. However, when special care is not taken, trajectories of individuals may also pose serious privacy risks even after they are de-identified or mapped into other forms. In this paper, we show that an unknown private trajectory can be reconstructed from knowledge of its properties released for data mining, which at first glance may not seem to pose any privacy threats. In particular, we propose a technique to demonstrate how private trajectories can be re-constructed from knowledge of their distances to a bounded set of known trajectories. Experiments performed on real data sets show that the number of known samples is surprisingly smaller than the actual theoretical bounds
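
    To make the distance-based risk described above concrete, here is a minimal sketch, not the paper's technique, of how a point can be recovered from its distances to known points. It assumes each trajectory is flattened into a vector of coordinates, that the released distance is plain Euclidean distance, and that dim + 1 known trajectories in general position are available; the names `solve_linear` and `reconstruct` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("known trajectories are not in general position")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def reconstruct(known, distances):
    """Recover an unknown vector from Euclidean distances to dim + 1 known vectors.

    Subtracting |x - p0|^2 = d0^2 from |x - p|^2 = d^2 cancels |x|^2 and
    leaves the linear equation 2 (p - p0) . x = |p|^2 - |p0|^2 - d^2 + d0^2.
    """
    p0, d0 = known[0], distances[0]
    if len(known) != len(p0) + 1:
        raise ValueError("this sketch needs exactly dim + 1 known trajectories")
    norm0 = sum(v * v for v in p0)
    a, b = [], []
    for p, d in zip(known[1:], distances[1:]):
        a.append([2 * (pi - p0i) for pi, p0i in zip(p, p0)])
        b.append(sum(v * v for v in p) - norm0 - d * d + d0 * d0)
    return solve_linear(a, b)


# Usage: a "private" trajectory of two 2-D points, flattened to 4 numbers.
secret = [1.0, 2.0, 3.0, 4.0]
known = [[0.0] * 4] + [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
released = [math.dist(secret, k) for k in known]
recovered = reconstruct(known, released)
```

    This sketch uses the full dim + 1 known trajectories; the abstract reports that in experiments far fewer known samples were needed than the theoretical bounds suggest.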

    Privacy preserving spatio-temporal clustering on horizontally partitioned data

    Time-stamped location information is regarded as spatio-temporal data and, by its nature, such data is highly sensitive from the perspective of privacy. In this paper, we propose a privacy preserving spatio-temporal clustering method for horizontally partitioned data which, to the best of our knowledge, has not been done before. Our methods are based on building the dissimilarity matrix through a series of secure multi-party trajectory comparisons managed by a third party. Our trajectory comparison protocol complies with most trajectory comparison functions, and complexity analysis of our methods shows that our protocol does not introduce extra overhead when constructing the dissimilarity matrix, compared to the centralized approach. This work was funded by the Information Society Technologies programme of the European Commission, Future and Emerging Technologies under IST-014915 GeoPKDD project

    Privacy perception and information technology utilization of high school students

    Mobile technologies are commonly used by, and important to, high school students, since teens ages 14 to 17 use these open platforms to share information, communicate, and construct their desired cyber identity. Accompanying technology for related data privacy within implementing educational applications is yet to be developed. This research was designed to investigate the perceptions of data privacy and the protection of personal data of high school students who are surrounded by the Internet, social media and technology. The perception of high school students' personal data privacy survey was developed and conducted with 1065 high school students (9th grade). The study presents five main themes: (1) ownership and utilization of different technologies and password sharing, (2) Internet utilization and perception of privacy, (3) social media utilization and perception of personal privacy on social media, (4) knowledge level and perception of personal data conservation, (5) information technology utilization. High school students have a personal data privacy algorithm, but persons or institutions outside this algorithm are perceived as a threat to their personal data and are rejected. This research suggests developing practices and techniques to overcome students' concerns about privacy risks that result from the collection and sharing of personal data

    Flexible fair and collusion resistant pseudonym providing system

    In service providing systems, user authentication is required for different purposes such as billing, restricting unauthorized access, etc. To protect the privacy of users, their real identities should not be linked to the services that they use during authentication. A good solution is to use pseudonyms as temporary identities. On the other hand, it may also be required to have a backdoor in pseudonym systems for identity revealing that can be used by law enforcement agencies for legal reasons. Existing systems that retain a backdoor are either punitive (full user anonymity is revealed), or they are restrictive by revealing only current pseudonym identity of. In addition to that, existing systems are designed for a particular service and may not fit into others. In this paper, we address this gap and we propose a novel pseudonym providing and management system. Our system is flexible and can be tuned to fit into services for different service providers. The system is privacy-preserving and guarantees a level of anonymity for a particular number of users. Trust in our system is distributed among all system entities instead of centralizing it into a single trusted third party. More importantly, our system is highly resistant to collusions among the trusted entities. Our system also has the ability to reveal user identity fairly in case of a request by law enforcement. Analytical and simulation based performance evaluation showed that the Collusion Resistant Pseudonym Providing System (CoRPPS) provides a high level of anonymity with strong resistance against collusion attacks

    Predicting worker disagreement for more effective crowd labeling

    Crowdsourcing is a popular mechanism used for labeling tasks to produce large corpora for training. However, producing a reliable crowd labeled training corpus is challenging and resource consuming. Research on crowdsourcing has shown that label quality is much affected by worker engagement and expertise. In this study, we postulate that label quality can also be affected by inherent ambiguity of the documents to be labeled. Such ambiguities are not known in advance, of course, but, once encountered by the workers, they lead to disagreement in the labeling – a disagreement that cannot be resolved by employing more workers. To deal with this problem, we propose a crowd labeling framework: we train a disagreement predictor on a small seed of documents, and then use this predictor to decide which documents of the complete corpus should be labeled and which should be checked for document-inherent ambiguities before assigning (and potentially wasting) worker effort on them. We report on the findings of the experiments we conducted on crowdsourcing a Twitter corpus for sentiment classification

    SU-Sentilab: a classification system for sentiment analysis in Twitter

    Sentiment analysis refers to automatically extracting the sentiment present in a given natural language text. We present our participation in the SemEval2013 competition, in the sentiment analysis of Twitter and SMS messages. Our approach for this task combines two sentiment analysis subsystems to build the final system. Both subsystems use supervised learning with features based on various polarity lexicons